Text binarization in color documents
Internal identifier: 001000 (Main/Exploration); previous: 000F99; next: 001001
Authors: Efthimios Badekas [Greece]; Nikos Nikolaou [Greece]; Nikos Papamarkos [Greece]
Source:
- International Journal of Imaging Systems and Technology [ 0899-9457 ] ; 2006.
Abstract
This article presents a new method for the binarization of color document images. Initially, the colors of the document image are reduced to a small number using a new color reduction technique. Specifically, this technique estimates the dominant colors and then assigns the original image colors to them so that the background and text components become uniform. Each dominant color defines a color plane in which the connected components (CCs) are extracted. Next, a CC filtering procedure is applied in each color plane, followed by a grouping procedure. At the end of this stage, blocks of CCs are constructed, which are then redefined by obtaining the direction of connection (DOC) property for each CC. Using the DOC property, the blocks of CCs are classified as text or nontext. The identified text blocks are binarized using suitable binarization techniques, with the rest of the pixels treated as background. The final result is a binary image that always contains black characters on a white background, regardless of the original colors of each text block. The proposed approach can also be used to binarize noisy color (or gray‐scale) document images. Several experiments that confirm the effectiveness of the proposed technique are presented. © 2007 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 16, 262–274, 2006
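The stages named in the abstract can be illustrated with a toy sketch. This is not the authors' method: the abstract does not specify the color reduction technique, the CC filtering and grouping rules, or the DOC property. The sketch assumes the most frequent colors as dominant colors, and it replaces filtering, grouping and DOC-based classification with one hypothetical size threshold (`max_text_fraction`). All function names are invented for illustration.

```python
from collections import Counter, deque

def reduce_colors(img, k):
    """Map each pixel to the index of the nearest of the k most frequent colors.
    Assumption: a simple stand-in for the paper's (unspecified) color reduction."""
    counts = Counter(px for row in img for px in row)
    dominant = [color for color, _ in counts.most_common(k)]
    def nearest(px):
        return min(range(len(dominant)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(px, dominant[i])))
    return [[nearest(px) for px in row] for row in img], dominant

def connected_components(plane):
    """Return the 4-connected components of the True pixels of a boolean plane."""
    h, w = len(plane), len(plane[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if plane[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, comp = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and plane[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(comp)
    return comps

def binarize(img, k=2, max_text_fraction=0.25):
    """Return a 0/1 image where 0 is black text and 1 is white background."""
    labels, dominant = reduce_colors(img, k)
    h, w = len(img), len(img[0])
    out = [[1] * w for _ in range(h)]
    for idx in range(len(dominant)):
        # One color plane per dominant color.
        plane = [[label == idx for label in row] for row in labels]
        for comp in connected_components(plane):
            # Hypothetical text test (NOT the DOC property): small components count as text.
            if len(comp) <= max_text_fraction * h * w:
                for y, x in comp:
                    out[y][x] = 0
    return out

# Light text on a dark background still comes out as black on white.
B, Y = (0, 0, 200), (250, 250, 0)
img = [
    [B, B, B, B, B, B],
    [B, Y, B, B, Y, B],
    [B, Y, B, B, Y, B],
    [B, B, B, B, B, B],
]
result = binarize(img)
for row in result:
    print("".join("#" if v == 0 else "." for v in row))
```

The size threshold only separates small strokes from a large background region. A real classifier would need the filtering, grouping and DOC steps that the article describes.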
DOI: 10.1002/ima.20092
Links to previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000224
- to stream Istex, to step Curation: 000221
- to stream Istex, to step Checkpoint: 000973
- to stream Main, to step Merge: 001017
- to stream Main, to step Curation: 001000
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Text binarization in color documents</title>
<author><name sortKey="Badekas, Efthimios" sort="Badekas, Efthimios" uniqKey="Badekas E" first="Efthimios" last="Badekas">Efthimios Badekas</name>
</author>
<author><name sortKey="Nikolaou, Nikos" sort="Nikolaou, Nikos" uniqKey="Nikolaou N" first="Nikos" last="Nikolaou">Nikos Nikolaou</name>
</author>
<author><name sortKey="Papamarkos, Nikos" sort="Papamarkos, Nikos" uniqKey="Papamarkos N" first="Nikos" last="Papamarkos">Nikos Papamarkos</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:259ED5DFC985771834D15F152D7C097A05F34BB5</idno>
<date when="2006" year="2006">2006</date>
<idno type="doi">10.1002/ima.20092</idno>
<idno type="url">https://api.istex.fr/document/259ED5DFC985771834D15F152D7C097A05F34BB5/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000224</idno>
<idno type="wicri:Area/Istex/Curation">000221</idno>
<idno type="wicri:Area/Istex/Checkpoint">000973</idno>
<idno type="wicri:doubleKey">0899-9457:2006:Badekas E:text:binarization:in</idno>
<idno type="wicri:Area/Main/Merge">001017</idno>
<idno type="wicri:Area/Main/Curation">001000</idno>
<idno type="wicri:Area/Main/Exploration">001000</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Text binarization in color documents</title>
<author><name sortKey="Badekas, Efthimios" sort="Badekas, Efthimios" uniqKey="Badekas E" first="Efthimios" last="Badekas">Efthimios Badekas</name>
<affiliation wicri:level="1"><country xml:lang="fr">Grèce</country>
<wicri:regionArea>Department of Electrical and Computer Engineering, Image Processing and Multimedia Laboratory, Democritus University of Thrace, 67100 Xanthi</wicri:regionArea>
<wicri:noRegion>67100 Xanthi</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Nikolaou, Nikos" sort="Nikolaou, Nikos" uniqKey="Nikolaou N" first="Nikos" last="Nikolaou">Nikos Nikolaou</name>
<affiliation wicri:level="1"><country xml:lang="fr">Grèce</country>
<wicri:regionArea>Department of Electrical and Computer Engineering, Image Processing and Multimedia Laboratory, Democritus University of Thrace, 67100 Xanthi</wicri:regionArea>
<wicri:noRegion>67100 Xanthi</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Papamarkos, Nikos" sort="Papamarkos, Nikos" uniqKey="Papamarkos N" first="Nikos" last="Papamarkos">Nikos Papamarkos</name>
<affiliation wicri:level="1"><country xml:lang="fr">Grèce</country>
<wicri:regionArea>Department of Electrical and Computer Engineering, Image Processing and Multimedia Laboratory, Democritus University of Thrace, 67100 Xanthi</wicri:regionArea>
<wicri:noRegion>67100 Xanthi</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">International Journal of Imaging Systems and Technology</title>
<title level="j" type="abbrev">Int. J. Imaging Syst. Technol.</title>
<idno type="ISSN">0899-9457</idno>
<idno type="eISSN">1098-1098</idno>
<imprint><publisher>Wiley Subscription Services, Inc., A Wiley Company</publisher>
<pubPlace>Hoboken</pubPlace>
<date type="published" when="2006">2006</date>
<biblScope unit="volume">16</biblScope>
<biblScope unit="issue">6</biblScope>
<biblScope unit="page" from="262">262</biblScope>
<biblScope unit="page" to="274">274</biblScope>
</imprint>
<idno type="ISSN">0899-9457</idno>
</series>
<idno type="istex">259ED5DFC985771834D15F152D7C097A05F34BB5</idno>
<idno type="DOI">10.1002/ima.20092</idno>
<idno type="ArticleID">IMA20092</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0899-9457</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>binarization</term>
<term>color quantization</term>
<term>document processing</term>
<term>segmentation</term>
<term>text localization</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This article presents a new method for the binarization of color document images. Initially, the colors of the document image are reduced to a small number using a new color reduction technique. Specifically, this technique estimates the dominant colors and then assigns the original image colors to them so that the background and text components become uniform. Each dominant color defines a color plane in which the connected components (CCs) are extracted. Next, a CC filtering procedure is applied in each color plane, followed by a grouping procedure. At the end of this stage, blocks of CCs are constructed, which are then redefined by obtaining the direction of connection (DOC) property for each CC. Using the DOC property, the blocks of CCs are classified as text or nontext. The identified text blocks are binarized using suitable binarization techniques, with the rest of the pixels treated as background. The final result is a binary image that always contains black characters on a white background, regardless of the original colors of each text block. The proposed approach can also be used to binarize noisy color (or gray‐scale) document images. Several experiments that confirm the effectiveness of the proposed technique are presented. © 2007 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 16, 262–274, 2006</div>
</front>
</TEI>
<affiliations><list><country><li>Grèce</li>
</country>
</list>
<tree><country name="Grèce"><noRegion><name sortKey="Badekas, Efthimios" sort="Badekas, Efthimios" uniqKey="Badekas E" first="Efthimios" last="Badekas">Efthimios Badekas</name>
</noRegion>
<name sortKey="Nikolaou, Nikos" sort="Nikolaou, Nikos" uniqKey="Nikolaou N" first="Nikos" last="Nikolaou">Nikos Nikolaou</name>
<name sortKey="Papamarkos, Nikos" sort="Papamarkos, Nikos" uniqKey="Papamarkos N" first="Nikos" last="Papamarkos">Nikos Papamarkos</name>
</country>
</tree>
</affiliations>
</record>
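As a rough illustration of reading fields from the record, the sketch below parses only the `<series>` element, copied from the record above, with Python's standard `xml.etree.ElementTree`. It deliberately avoids the full record: the `wicri:` prefix is used there without a visible namespace declaration, so a strict XML parser would likely reject it unless a declaration is supplied. The function name `series_citation` and the output layout are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The <series> element from the record above (it uses no wicri: prefix).
SERIES_FRAGMENT = """<series><title level="j">International Journal of Imaging Systems and Technology</title>
<title level="j" type="abbrev">Int. J. Imaging Syst. Technol.</title>
<idno type="ISSN">0899-9457</idno>
<idno type="eISSN">1098-1098</idno>
<imprint><publisher>Wiley Subscription Services, Inc., A Wiley Company</publisher>
<pubPlace>Hoboken</pubPlace>
<date type="published" when="2006">2006</date>
<biblScope unit="volume">16</biblScope>
<biblScope unit="issue">6</biblScope>
<biblScope unit="page" from="262">262</biblScope>
<biblScope unit="page" to="274">274</biblScope>
</imprint>
<idno type="ISSN">0899-9457</idno>
</series>"""

def series_citation(xml_text):
    """Build a short journal citation string from a <series> element."""
    root = ET.fromstring(xml_text)
    journal = root.find("title[@type='abbrev']").text
    year = root.find("imprint/date").get("when")
    scope = {}
    for el in root.iterfind("imprint/biblScope"):
        unit = el.get("unit")
        if unit == "page":
            # First and last page are separate elements, told apart by from/to.
            key = "first_page" if el.get("from") else "last_page"
            scope[key] = el.text
        else:
            scope[unit] = el.text
    return (f"{journal} {scope['volume']}({scope['issue']}), "
            f"{scope['first_page']}-{scope['last_page']} ({year})")

citation = series_citation(SERIES_FRAGMENT)
print(citation)
```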
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001000 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001000 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:259ED5DFC985771834D15F152D7C097A05F34BB5 |texte= Text binarization in color documents }}
This area was generated with Dilib version V0.6.32.